ABSTRACT

This paper discusses Bayesian m-group regression
where the groups are arranged in a two-way layout into m rows and n
columns, there still being a regression of y on the x's within each
group. The mathematical model is then provided as applied to the case
where the rows correspond to high schools and the columns to
colleges: the predictor variables might be the performances in
various course areas taken while at high school and the random
variable the first-year performance at college. Then "cell" (i, j) of
the two-way table will contain data on the performances of those
subjects who passed from high school i to college j. Eighteen
equations are given. (DB)
MULTIPLE REGRESSION IN A TWO-WAY LAYOUT



by 

Dennis V. Lindley 
University College London 

Suppose that in each of a number, m, of groups a random variable y has
a linear regression on a set of predictor variables $x_1, x_2, \ldots, x_p$. A
Bayesian approach to the estimation of the m regression lines, valid under
certain assumptions of exchangeability, has been given by Lindley (1969, 1970).
The theory has been described as part of the Bayesian analysis of the general
linear model by Lindley and Smith (1972); see, in particular, Section 3.2.
The basic ideas have been extensively developed and put into an operational
form by Jackson, Novick, and Thayer (1971) under the name of m-group regression
and implemented in a major application by Novick, Jones, and Cole (1972).
Technical details concerning the computer program are described by Jones and
Novick (1972).

The present paper extends these ideas to the case where the groups, now
m times n in all, are arranged in a two-way layout into m rows and n columns,
there still being a regression of y on the x's within each group. The sort
of application we have in mind is where the rows correspond to high schools
and the columns to colleges: the predictor variables might be the performances
in various course areas taken while at high school and the random variable
the first-year performance at college. Then "cell" (i, j) of the two-way
table will contain data on the performances of those subjects who passed from
high school i to college j. In the case of m-group regression, the estimation
of any single regression can be improved by using all the data and not just
that from the group under consideration. Our aim is to effect similar
improvements in the case of the two-way layout.

The situation where there are no predictor variables is a non-trivial
special case. There, in our example, we would merely have the college score,
and the familiar analysis-of-variance type of situation arises in which
we try to separate out the variation into components due to schools (rows)
and colleges (columns). This has been discussed by Lindley (1972). It turns
out that the regression case is a straightforward multivariate generalization
of the special situation. Essentially, all that happens is that the scalar
equations for the special case become vector and matrix results in the general
problem. The reader is, therefore, advised to read the simpler, earlier
paper first. Once the basic ideas have been understood in the scalar context,
the difficulties in the present, multivariate case should be substantially
reduced.

The data are supposed to be generated by a model in which

$$E(y_{ijk}) = \sum_{s=1}^{p} \theta_{ijs}\, x_{ijks} \tag{1}$$

for $i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$; $k = 1, 2, \ldots, r_{ij}$. Here, y is
the random variable having linear regression on p variables $x_1, x_2, \ldots, x_p$.
The suffix i refers to the row, j to the column, s to the predictor variable,
and k to the replicate number within a cell. Hence, $\theta_{ijs}$ is the regression
coefficient of y on $x_s$ in the group that is in the $i$th row and $j$th column.
It will further be supposed that the distribution of the y-values,
conditional on the x-values and the regression coefficients, is normal; that
they are all independent and have constant variance $\sigma^2$. It is possible to
relax this last condition and have the variances possibly different from
cell to cell. This was done in the earlier papers referred to, but the
work of Jackson, et al. (1971) suggests that the practical effect of such
a generalization is minimal. Since it complicates the algebra, which is
already formidable enough, we have not dealt with the heteroscedastic case
here. The reader who wishes to study it will find the ideas needed
discussed in Appendix 4 of Lindley (1972). This paper will subsequently be
referred to as L.
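To make the sampling model concrete, the following minimal sketch simulates data from (1) under the stated assumptions of normality, independence, and constant variance. The function name `simulate_cells`, the array layout, the standard normal predictor values, and the choice of a constant first predictor are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_cells(theta, r, sigma, seed=0):
    """Draw y_ijk ~ N(x_ijk' theta_ij, sigma^2) for every cell of the layout.

    theta : (m, n, p) array; theta[i, j] is the coefficient vector of cell (i, j)
    r     : (m, n) integer array of replicate counts r_ij
    sigma : common residual standard deviation
    Returns two dicts, X and y, keyed by the cell index (i, j).
    """
    rng = np.random.default_rng(seed)
    m, n, p = theta.shape
    X, y = {}, {}
    for i in range(m):
        for j in range(n):
            Xij = rng.normal(size=(r[i, j], p))   # hypothetical predictor values
            Xij[:, 0] = 1.0                       # assumed constant first predictor
            noise = sigma * rng.normal(size=r[i, j])
            X[i, j] = Xij
            y[i, j] = Xij @ theta[i, j] + noise   # model (1) plus normal error
    return X, y
```

Each cell may have its own replicate count, which is why the sketch stores the cells in dictionaries rather than in one rectangular array.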

In most applications, one of the predictor variables, say $x_1$, will be
a constant, say one, to allow for a constant term in (1). If so, that
equation may be written

$$E(y_{ijk}) = \theta_{ij1} + \sum_{s=2}^{p} \theta_{ijs}\, x_{ijks}.$$

If p = 1, we have simply $E(y_{ijk}) = \theta_{ij1}$, the third suffix for $\theta$ being
redundant, and the situation is exactly that discussed in L. The prior
structure assigned to the mn values $\theta_{ij}$ was to suppose

$$\theta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij},$$

where $\alpha_i \sim N(0, \sigma_a^2)$, $\beta_j \sim N(0, \sigma_b^2)$, $\gamma_{ij} \sim N(0, \sigma_c^2)$, all these distributions
being independent, and the prior knowledge of $\mu$ to be vague. The reasons
given for this assumption were stated at length in L, but essentially, it
follows from supposed exchangeability between rows and between columns plus
assumptions of additivity and normality.
Now, we can equally write (1) as

$$E(y_{ijk}) = x_{ijk}^T \theta_{ij}, \tag{2}$$

where $\theta_{ij}$ is a vector given by

$$\theta_{ij}^T = (\theta_{ij1}, \theta_{ij2}, \ldots, \theta_{ijp}),$$

the vector of regression coefficients from cell (i, j), and $x_{ijk}$ a similar
vector with typical element $x_{ijks}$. We make the same assumption of prior
structure about the vector $\theta_{ij}$ that we made about the scalar $\theta_{ij}$.
Specifically, we assume

$$\theta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}, \tag{3}$$

where $\alpha_i \sim N_p(0, \Sigma_a)$, $\beta_j \sim N_p(0, \Sigma_b)$, $\gamma_{ij} \sim N_p(0, \Sigma_c)$, all these distributions
being independent and the prior knowledge of $\mu$ being vague. (Here, $N_p$ means
multivariate, p-dimensional normal, and the $\Sigma$'s are the dispersion matrices.)
Essentially, this means that the set of regression coefficients in cell (i, j)
has possible effects due to the row ($\alpha_i$), to the column ($\beta_j$), and to an
interaction between row and column ($\gamma_{ij}$). The remarks in L about the restrictive
nature of this assumption and the necessity of checking its reasonable
validity before using the methods developed with it as a basis, apply with
even more force in this multivariate case.

The form (3) implies the following covariance structure for the vectors
$\theta_{ij}$:

$$\operatorname{cov}(\theta_{ij}, \theta_{rs}) = 0, \quad i \neq r,\ j \neq s, \tag{4a}$$

$$\operatorname{cov}(\theta_{ij}, \theta_{is}) = \Sigma_a, \quad j \neq s, \tag{4b}$$

$$\operatorname{cov}(\theta_{ij}, \theta_{rj}) = \Sigma_b, \quad i \neq r, \tag{4c}$$

$$\operatorname{cov}(\theta_{ij}, \theta_{ij}) = \Sigma_a + \Sigma_b + \Sigma_c, \tag{4d}$$

and we alternatively express our prior structure by saying that the vectors
$\theta_{ij}$ have means $\mu$ and dispersions described by equations (4). In the language
of the general linear model developed by Lindley and Smith (1972), the second
stage has a dispersion matrix $C_2$ whose elements are described by (4): these
elements themselves being matrices. To find our estimates of $\theta_{ij}$, we use
the general theory in the way described in Appendix 1 of L. To do this, we
have first to invert $C_2$. This is most easily done by solving the linear
equations in z, $C_2 z = a$. These equations* are easily seen to be [cf. (1.3)
of L]

$$\Sigma_c z_{ij} + n\Sigma_a z_{i\cdot} + m\Sigma_b z_{\cdot j} = a_{ij}, \tag{5}$$

where a dot in place of a suffix denotes the average over that suffix.
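The covariance structure (4) and the linear equations (5) can be checked numerically. The sketch below assembles the second-stage dispersion matrix with Kronecker products and extracts individual blocks. The function names and the ordering of cells (cell (i, j) stored as block number i*n + j) are assumptions made for illustration only.

```python
import numpy as np

def second_stage_dispersion(Sa, Sb, Sc, m, n):
    """Assemble C_2 from (4); cell (i, j) occupies block number i*n + j."""
    Im, In = np.eye(m), np.eye(n)
    Jm, Jn = np.ones((m, m)), np.ones((n, n))
    return (np.kron(np.kron(Im, Jn), Sa)      # Sigma_a whenever the rows agree
            + np.kron(np.kron(Jm, In), Sb)    # Sigma_b whenever the columns agree
            + np.kron(np.kron(Im, In), Sc))   # Sigma_c on the diagonal blocks only

def block(C, i, j, r, s, n, p):
    """Extract the p x p block cov(theta_ij, theta_rs) from C."""
    a, b = (i * n + j) * p, (r * n + s) * p
    return C[a:a + p, b:b + p]
```

Multiplying this matrix into a stacked vector z reproduces the left-hand side of (5) when the dotted quantities are read as averages, which is the reading assumed here.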

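For any array $u_{ij}$, vector or scalar, write

$$u'_{ij} = u_{ij} - u_{i\cdot} - u_{\cdot j} + u_{\cdot\cdot}, \qquad u'_{i\cdot} = u_{i\cdot} - u_{\cdot\cdot}, \qquad u'_{\cdot j} = u_{\cdot j} - u_{\cdot\cdot}, \qquad u'_{\cdot\cdot} = u_{\cdot\cdot}.$$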

Then, (5) may be rewritten

$$V_0 z'_{ij} + V_n z'_{i\cdot} + V_m z'_{\cdot j} + V_{mn} z'_{\cdot\cdot} = a_{ij}, \tag{6}$$

where

$$V_0 = \Sigma_c, \qquad V_n = \Sigma_c + n\Sigma_a, \qquad V_m = \Sigma_c + m\Sigma_b, \tag{7}$$

and $V_{mn} = \Sigma_c + n\Sigma_a + m\Sigma_b$.

*Notice that in writing out these and similar equations in mnp unknowns, we
have written them as mn equations each involving p-vectors. In this way, their
structure is most easily understood and related to the scalar forms in L.



The summation of (6) over all i, j immediately gives $V_{mn} z'_{\cdot\cdot} = a'_{\cdot\cdot}$;
over j alone gives $V_n z'_{i\cdot} = a'_{i\cdot}$; over i alone gives $V_m z'_{\cdot j} = a'_{\cdot j}$, and hence,
substituting in (6), $V_0 z'_{ij} = a'_{ij}$. The solution to $C_2 z = a$ is, therefore,*

$$z_{ij} = V_0^{-1} a'_{ij} + V_n^{-1} a'_{i\cdot} + V_m^{-1} a'_{\cdot j} + V_{mn}^{-1} a'_{\cdot\cdot}. \tag{8}$$

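The coefficients of $a_{rs}$ (not $a'_{rs}$) on the right-hand side of this equation
are the elements in $C_2^{-1}$. If the elements of this matrix are termed the
coprecisions, and abbreviated cop, we have [cf. equations (4) above]

$$\operatorname{cop}(\theta_{ij}, \theta_{rs}) = H_0, \quad i \neq r,\ j \neq s, \tag{9a}$$

$$\operatorname{cop}(\theta_{ij}, \theta_{is}) = H_a + H_0, \quad j \neq s, \tag{9b}$$

$$\operatorname{cop}(\theta_{ij}, \theta_{rj}) = H_b + H_0, \quad i \neq r, \tag{9c}$$

$$\operatorname{cop}(\theta_{ij}, \theta_{ij}) = H_a + H_b + H_c + H_0, \tag{9d}$$

where

$$mn H_0 = V_0^{-1} - V_n^{-1} - V_m^{-1} + V_{mn}^{-1}, \tag{10a}$$

$$n H_a = -V_0^{-1} + V_n^{-1}, \tag{10b}$$

$$m H_b = -V_0^{-1} + V_m^{-1}, \tag{10c}$$

and

$$H_c = V_0^{-1}. \tag{10d}$$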
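Equations (7), (9), and (10) can be verified numerically: the block matrix built from the coprecisions should be the exact inverse of the block matrix built from the covariances (4). The following sketch does this with Kronecker products. The helper names `coprecision_blocks` and `assemble_blocks` are hypothetical, and cell (i, j) is assumed to be stored as block number i*n + j.

```python
import numpy as np

def coprecision_blocks(Sa, Sb, Sc, m, n):
    """Return H_0, H_a, H_b, H_c of (10), using the V-matrices of (7)."""
    inv = np.linalg.inv
    V0, Vn, Vm, Vmn = Sc, Sc + n * Sa, Sc + m * Sb, Sc + n * Sa + m * Sb
    H0 = (inv(V0) - inv(Vn) - inv(Vm) + inv(Vmn)) / (m * n)   # (10a)
    Ha = (inv(Vn) - inv(V0)) / n                              # (10b)
    Hb = (inv(Vm) - inv(V0)) / m                              # (10c)
    Hc = inv(V0)                                              # (10d)
    return H0, Ha, Hb, Hc

def assemble_blocks(B_all, B_row, B_col, B_cell, m, n):
    """Block matrix with B_all in every block, plus B_row where the rows agree,
    B_col where the columns agree, and B_cell on the diagonal blocks."""
    Im, In = np.eye(m), np.eye(n)
    Jm, Jn = np.ones((m, m)), np.ones((n, n))
    return (np.kron(np.kron(Jm, Jn), B_all)
            + np.kron(np.kron(Im, Jn), B_row)
            + np.kron(np.kron(Jm, In), B_col)
            + np.kron(np.kron(Im, In), B_cell))
```

With a zero matrix for `B_all` and the three dispersion matrices in the other slots, `assemble_blocks` gives the second-stage dispersion matrix; with the four H-matrices it gives the matrix of coprecisions, so the product of the two should be the unit matrix.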

*The reader can check that with p = 1, this agrees with the result (1.10) in
L. The new form given here is more convenient in the matrix case.

Having found $C_2^{-1}$, we need to evaluate

$$C_2^{-1} A_2 \left(A_2^T C_2^{-1} A_2\right)^{-1} A_2^T C_2^{-1}, \tag{11}$$

where $A_2$ is a matrix of mnp rows and p columns all, mn, of whose p x p
submatrices are unit matrices. This last fact follows since every $\theta_{ij}$ has
the same expectation, $\mu$. It is easy to verify that the column totals for
the submatrices in $C_2^{-1}$ are all $V_{mn}^{-1}$ and that from this it follows that (11)
is a square matrix of dimension mnp all of whose p x p submatrices are
$(mn V_{mn})^{-1}$. This completes the calculations needed at the second stage of
the model.
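The claim about the submatrices of (11) can be confirmed by brute force. The sketch below takes (11) to be the usual correction term of the general linear model with a vague final stage, built from the inverse second-stage dispersion matrix and a stack of mn unit matrices; that reading, and the function name, are assumptions of this illustration.

```python
import numpy as np

def third_stage_correction(C2, m, n, p):
    """Evaluate C2^{-1} A2 (A2' C2^{-1} A2)^{-1} A2' C2^{-1},
    where A2 stacks mn unit matrices of order p."""
    A2 = np.kron(np.ones((m * n, 1)), np.eye(p))   # (mnp) x p
    C2inv = np.linalg.inv(C2)
    M = C2inv @ A2                                 # every block is a column total of C2^{-1}
    # M.T equals A2' C2^{-1} because C2 is symmetric.
    return M @ np.linalg.solve(A2.T @ M, M.T)
```

Every p x p block of the result should equal the inverse of mn times the matrix $V_{mn}$, whatever positive definite dispersion matrices are supplied.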

The first stage of the model is described by (2), and in the notation
of the general linear model, this is given in terms of a matrix $A_1$ of a form
with matrices $X_{ij}$ down the "diagonal" and zeros elsewhere, the matrix $X_{ij}$
having $r_{ij}$ rows and p columns with typical (k, s) element $x_{ijks}$. The
corresponding dispersion matrix, $C_1$, is simply $\sigma^2$ times a unit matrix, and
so the matrix $A_1^T C_1^{-1} A_1$ that occurs in the usual normal equations has diagonal
submatrices $X_{ij}^T X_{ij} \sigma^{-2}$, all of size p x p, and zeros elsewhere. $X_{ij}^T X_{ij}$ is
simply the matrix of sums of squares and products of the x-values occurring
in cell (i, j); the typical (s, t) element being $\sum_k x_{ijks} x_{ijkt}$. Similarly,
on the other side of the normal equation, we shall have terms $\sigma^{-2} \sum_k x_{ijks} y_{ijk}$.
These produce a column vector consisting of subvectors, a typical one of which
is $X_{ij}^T y_{ij} \sigma^{-2}$, where $y_{ij}$ is the vector of elements $y_{ijk}$.

We are now in a position to write down the equations satisfied by the
modal estimates of $\theta_{ijs}$ assuming $\Sigma_a$, $\Sigma_b$, $\Sigma_c$, and $\sigma^2$ to be known [cf. equations
(3.1) of L]. In writing these out, we should distinguish between $\theta_{ijs}$ and the
estimate $\theta^*_{ijs}$, say, but to avoid even more complicated notation, the asterisk
will be omitted. The equations* are

$$\left[X_{ij}^T X_{ij} \sigma^{-2} + H_c\right]\theta_{ij} + n H_a \theta_{i\cdot} + m H_b \theta_{\cdot j} + \left[mn H_0 - V_{mn}^{-1}\right]\theta_{\cdot\cdot} = X_{ij}^T y_{ij} \sigma^{-2}. \tag{12}$$

*In the notation of Appendix 7 of L, they are $D\theta = d$.
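[The corresponding equations in the scalar case are (3.1) in L.] This is
the main result of the paper since it provides the adjusted estimates of the
regression coefficients that replace the usual least-squares estimates.
The latter would be the solution of the normal equations

$$X_{ij}^T X_{ij}\, \theta_{ij} = X_{ij}^T y_{ij},$$

obtained from (12) by omitting all the H-matrices and $V_{mn}^{-1}$. In the normal
equations, the regression coefficients in cell (i, j) are not involved with the
coefficients in any other cell, whereas all occur in (12). The basic equations
may be rewritten in terms of the original matrices $\Sigma_a$, $\Sigma_b$, and $\Sigma_c$ using
equations (10) and (7). The result is

$$X_{ij}^T X_{ij} \sigma^{-2}\, \theta_{ij} + \Sigma_c^{-1}\left(\theta_{ij} - \theta_{i\cdot} - \theta_{\cdot j} + \theta_{\cdot\cdot}\right) + \left(\Sigma_c + n\Sigma_a\right)^{-1}\left(\theta_{i\cdot} - \theta_{\cdot\cdot}\right) + \left(\Sigma_c + m\Sigma_b\right)^{-1}\left(\theta_{\cdot j} - \theta_{\cdot\cdot}\right) = X_{ij}^T y_{ij} \sigma^{-2}. \tag{13}$$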
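For known dispersions, the estimating equations form one linear system in all mnp coefficients. The sketch below assembles that system from the coefficients in (12) and solves it directly. The function name, the dense solve, and the array layout (sums of squares and products stacked as an (m, n, p, p) array) are assumptions for illustration; a practical program would probably exploit the block structure instead.

```python
import numpy as np

def solve_theta(XtX, Xty, sigma2, Sa, Sb, Sc):
    """Solve (12) for all cells at once, with the dispersions treated as known.

    XtX : (m, n, p, p) array of X_ij' X_ij
    Xty : (m, n, p) array of X_ij' y_ij
    Returns theta as an (m, n, p) array.
    """
    m, n, p, _ = XtX.shape
    inv = np.linalg.inv
    V0, Vn, Vm, Vmn = Sc, Sc + n * Sa, Sc + m * Sb, Sc + n * Sa + m * Sb
    Hc = inv(V0)
    Ha = (inv(Vn) - inv(V0)) / n
    Hb = (inv(Vm) - inv(V0)) / m
    H0 = (inv(V0) - inv(Vn) - inv(Vm) + inv(Vmn)) / (m * n)
    Im, In = np.eye(m), np.eye(n)
    Jm, Jn = np.ones((m, m)), np.ones((n, n))
    # Prior part of the coefficient matrix; averages are written as sums / count.
    D = (np.kron(np.kron(Im, In), Hc)
         + np.kron(np.kron(Im, Jn), Ha)                        # n H_a theta_i.
         + np.kron(np.kron(Jm, In), Hb)                        # m H_b theta_.j
         + np.kron(np.kron(Jm, Jn), H0 - inv(Vmn) / (m * n)))  # [mn H_0 - V_mn^{-1}] theta_..
    # Data part: X_ij' X_ij / sigma^2 on the diagonal blocks.
    for c in range(m * n):
        i, j = divmod(c, n)
        D[c * p:(c + 1) * p, c * p:(c + 1) * p] += XtX[i, j] / sigma2
    rhs = (Xty / sigma2).reshape(m * n * p)
    return np.linalg.solve(D, rhs).reshape(m, n, p)
```

When the three dispersion matrices are made very large, the prior terms vanish and the solution should approach the within-cell least-squares estimates, which gives a simple sanity check.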



To complete the analysis, it only remains to provide equations for the
estimation of $\Sigma_a$, $\Sigma_b$, and $\Sigma_c$. To do this, we return to the structure of
the vectors of regression coefficients, $\theta_{ij}$, described in (3), resulting in
the covariance values given in (4). In the last of these results, (4d), the
three dispersion matrices occur in combination, and any attempt to base an
estimation procedure on this has difficulty in separating the components. This
can be seen again in (13) where, for example, $\Sigma_a$ always occurs in association with
$\Sigma_c$. To circumvent this difficulty, we redescribe the model in terms of an
extra stage, and suppose, firstly, that $\theta_{ij} \sim N_p(\alpha_i + \beta_j, \Sigma_c)$ and then that both
$\alpha_i \sim N_p(\mu_a, \Sigma_a)$ and $\beta_j \sim N_p(\mu_b, \Sigma_b)$ with vague prior knowledge of $\mu_a$ and $\mu_b$.

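Clearly, this is an alternative way of expressing (3). With this model, we
can write down the joint probability distribution of all the quantities in
the problem. After integration over the diffuse priors for $\mu_a$ and $\mu_b$, this
is easily seen to be proportional to (where $R = \sum_{i,j} r_{ij}$)

$$
\begin{aligned}
&(\sigma^2)^{-R/2} \exp\Big\{-\tfrac{1}{2}\sigma^{-2} \sum_{i,j,k} \big(y_{ijk} - x_{ijk}^T \theta_{ij}\big)^2\Big\} \\
&\times |\Sigma_c|^{-\frac{1}{2}mn} \exp\Big\{-\tfrac{1}{2} \sum_{i,j} (\theta_{ij} - \alpha_i - \beta_j)^T \Sigma_c^{-1} (\theta_{ij} - \alpha_i - \beta_j)\Big\} \\
&\times |\Sigma_a|^{-\frac{1}{2}(m-1)} \exp\Big\{-\tfrac{1}{2} \sum_{i} (\alpha_i - \alpha_{\cdot})^T \Sigma_a^{-1} (\alpha_i - \alpha_{\cdot})\Big\} \\
&\times |\Sigma_b|^{-\frac{1}{2}(n-1)} \exp\Big\{-\tfrac{1}{2} \sum_{j} (\beta_j - \beta_{\cdot})^T \Sigma_b^{-1} (\beta_j - \beta_{\cdot})\Big\}.
\end{aligned} \tag{14}
$$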



In explanation, the first line describes the likelihood of the data,
the second the distribution of the regression coefficients given the row and
column values (essentially, this is the interaction term), and the last two
lines provide the distributions of the row and column values, after removal
of their means by integration.
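The four factors just described translate directly into a function for the logarithm of (14), which is convenient for checking a fitted solution or for monitoring an iterative scheme. The sketch below is a minimal version under simplifying assumptions that are not in the paper: every cell has the same number r of replicates, so the data fit in rectangular arrays, and the function and argument names are hypothetical.

```python
import numpy as np

def quad_form_sum(v, S):
    """Sum over all leading indices of v' S^{-1} v, for v of shape (..., p)."""
    v2 = v.reshape(-1, S.shape[0])
    return np.sum((v2 @ np.linalg.inv(S)) * v2)

def log_joint_kernel(y, X, theta, alpha, beta, sigma2, Sa, Sb, Sc):
    """Logarithm of (14), up to an additive constant.

    y: (m, n, r); X: (m, n, r, p); theta: (m, n, p); alpha: (m, p); beta: (n, p).
    """
    m, n, r = y.shape
    logdet = lambda S: np.linalg.slogdet(S)[1]
    resid = y - np.einsum('ijks,ijs->ijk', X, theta)         # y_ijk - x_ijk' theta_ij
    inter = theta - alpha[:, None, :] - beta[None, :, :]     # theta_ij - alpha_i - beta_j
    a_dev = alpha - alpha.mean(axis=0)                       # alpha_i - alpha_.
    b_dev = beta - beta.mean(axis=0)                         # beta_j - beta_.
    return (-0.5 * m * n * r * np.log(sigma2) - 0.5 * np.sum(resid ** 2) / sigma2
            - 0.5 * m * n * logdet(Sc) - 0.5 * quad_form_sum(inter, Sc)
            - 0.5 * (m - 1) * logdet(Sa) - 0.5 * quad_form_sum(a_dev, Sa)
            - 0.5 * (n - 1) * logdet(Sb) - 0.5 * quad_form_sum(b_dev, Sb))
```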

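This is, by Bayes theorem, proportional to the posterior distribution of
$(\theta_{ij})$, $(\alpha_i)$, and $(\beta_j)$ if the dispersions are known. The modal values of the
$\theta$'s have been found as the solutions to equations (12) or (13). We now
determine the similar estimates of the $\alpha$'s and $\beta$'s. To do this, differentiate
the logarithm of (14) with respect to $\alpha_i$ and $\beta_j$ with the results

$$n\Sigma_c^{-1}\left(\theta_{i\cdot} - \alpha_i - \beta_{\cdot}\right) + \Sigma_a^{-1}\left(\alpha_{\cdot} - \alpha_i\right) = 0$$

and

$$m\Sigma_c^{-1}\left(\theta_{\cdot j} - \alpha_{\cdot} - \beta_j\right) + \Sigma_b^{-1}\left(\beta_{\cdot} - \beta_j\right) = 0. \tag{15}$$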
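Summation of the first (say) set over i gives $\alpha_{\cdot} + \beta_{\cdot} = \theta_{\cdot\cdot}$ and, hence,
eliminating $\beta_{\cdot}$ from a member of that set, we have

$$n\Sigma_c^{-1}\left(\theta_{i\cdot} - \theta_{\cdot\cdot}\right) = \left(n\Sigma_c^{-1} + \Sigma_a^{-1}\right)\left(\alpha_i - \alpha_{\cdot}\right)$$

or

$$\alpha_i - \alpha_{\cdot} = \left(n\Sigma_c^{-1} + \Sigma_a^{-1}\right)^{-1} n\Sigma_c^{-1}\left(\theta_{i\cdot} - \theta_{\cdot\cdot}\right).$$

Similarly,

$$\beta_j - \beta_{\cdot} = \left(m\Sigma_c^{-1} + \Sigma_b^{-1}\right)^{-1} m\Sigma_c^{-1}\left(\theta_{\cdot j} - \theta_{\cdot\cdot}\right). \tag{16}$$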



Since the estimates of the $\theta$'s are known, these equations determine the
estimates of the $\alpha$'s and $\beta$'s.
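As a concrete illustration of (16), the sketch below computes the row and column effects, expressed as deviations from their means, from a given array of coefficient estimates. The function name and the array layout are assumptions for illustration. The deviations are obtained by solving the stationarity conditions (15) after eliminating the mean of the column effects.

```python
import numpy as np

def row_col_effects(theta, Sa, Sb, Sc):
    """Return alpha_i - alpha_. as an (m, p) array and beta_j - beta_. as an (n, p) array.

    theta : (m, n, p) array of regression-coefficient estimates
    """
    m, n, p = theta.shape
    inv = np.linalg.inv
    Sc_inv = inv(Sc)
    t_row = theta.mean(axis=1)           # theta_i.
    t_col = theta.mean(axis=0)           # theta_.j
    t_all = theta.mean(axis=(0, 1))      # theta_..
    # Shrinkage matrices (n Sc^{-1} + Sa^{-1})^{-1} n Sc^{-1} and its column analogue.
    A = np.linalg.solve(n * Sc_inv + inv(Sa), n * Sc_inv)
    B = np.linalg.solve(m * Sc_inv + inv(Sb), m * Sc_inv)
    alpha_dev = (t_row - t_all) @ A.T    # row i holds A (theta_i. - theta_..)
    beta_dev = (t_col - t_all) @ B.T
    return alpha_dev, beta_dev
```

Both matrices shrink the observed row and column deviations toward zero, and the shrinkage weakens as the main-effect dispersion grows relative to the interaction dispersion.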

To complete the analysis, we have to insert prior distributions for $\sigma^2$,
$\Sigma_a$, $\Sigma_b$, and $\Sigma_c$. We suppose that these are independent, and for $\sigma^2$ there is no
objection to using the usual vague prior proportional to $\sigma^{-2}$. With the
three dispersion matrices, one cannot be so cavalier. In default of any
better idea, we assume that $\Sigma_u^{-1}$ has a Wishart distribution with degrees of
freedom $\nu_u$ and matrix $\Lambda_u$ (u = a, b, c). Specifically, $\Sigma_a^{-1}$ has density
proportional to

$$\left|\Sigma_a^{-1}\right|^{\frac{1}{2}(\nu_a - p - 1)} \exp\left[-\tfrac{1}{2}\nu_a \operatorname{tr}\left(\Lambda_a \Sigma_a^{-1}\right)\right] \tag{17}$$

with $\Sigma_b^{-1}$ and $\Sigma_c^{-1}$, similarly. The choice of values for the $\nu$'s and $\Lambda$'s will
be discussed below.

The procedure now is to combine the prior distribution, exemplified by
(17), with the other terms in (14) so obtaining the joint posterior distribution
for which the modal values can be determined by differentiation and equating
the results to zero. The analysis closely parallels that given by Lindley
and Smith (1972, Section 5.2) and the results are that
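$$\sigma^2 = \sum_{i,j,k} \big(y_{ijk} - x_{ijk}^T \theta_{ij}\big)^2 \Big/ (R + 2), \tag{18a}$$

$$\Sigma_c = \Big\{\nu_c \Lambda_c + \sum_{i,j} (\theta_{ij} - \alpha_i - \beta_j)(\theta_{ij} - \alpha_i - \beta_j)^T\Big\} \Big/ (mn + \nu_c - p - 1), \tag{18b}$$

$$\Sigma_a = \Big\{\nu_a \Lambda_a + \sum_{i} (\alpha_i - \alpha_{\cdot})(\alpha_i - \alpha_{\cdot})^T\Big\} \Big/ (m + \nu_a - p - 2), \tag{18c}$$

and

$$\Sigma_b = \Big\{\nu_b \Lambda_b + \sum_{j} (\beta_j - \beta_{\cdot})(\beta_j - \beta_{\cdot})^T\Big\} \Big/ (n + \nu_b - p - 2). \tag{18d}$$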



(In this set of equations, as in (12), the estimates, for example $\theta^*_{ij}$, should
strictly replace the values like $\theta_{ij}$ written there.)
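The modal equations for the variance and the three dispersion matrices can be collected in one update step. The sketch below assumes the forms that follow from combining (14) with the Wishart priors (17) and the vague prior on the residual variance: each dispersion estimate is a prior matrix plus a sum of outer products, divided by a count adjusted for the prior degrees of freedom. The function name, the dictionary arguments keyed 'a', 'b', 'c', and the use of deviations for the row and column effects are illustrative assumptions.

```python
import numpy as np

def update_dispersions(resid_ss, R, theta, alpha_dev, beta_dev, nu, Lam):
    """One evaluation of the modal equations (18).

    resid_ss  : residual sum of squares over all observations
    R         : total number of observations
    theta     : (m, n, p) coefficient estimates
    alpha_dev : (m, p) row effects as deviations, alpha_i - alpha_.
    beta_dev  : (n, p) column effects as deviations, beta_j - beta_.
    nu, Lam   : dicts keyed 'a', 'b', 'c' (degrees of freedom and prior matrices)
    """
    m, n, p = theta.shape
    t_all = theta.mean(axis=(0, 1))
    # theta_ij - alpha_i - beta_j, using alpha_. + beta_. = theta_..
    g = (theta - t_all - alpha_dev[:, None, :] - beta_dev[None, :, :]).reshape(-1, p)
    sigma2 = resid_ss / (R + 2)
    Sc = (nu['c'] * Lam['c'] + g.T @ g) / (m * n + nu['c'] - p - 1)
    Sa = (nu['a'] * Lam['a'] + alpha_dev.T @ alpha_dev) / (m + nu['a'] - p - 2)
    Sb = (nu['b'] * Lam['b'] + beta_dev.T @ beta_dev) / (n + nu['b'] - p - 2)
    return sigma2, Sa, Sb, Sc
```

The divisors must be positive, so with the degrees of freedom set equal to p this sketch needs at least three rows and three columns.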

To perform the analysis, values have to be specified for the $\nu$'s and $\Lambda$'s.
Since $\nu_a$ measures one's prior precision about $\Sigma_a$, it is natural to take $\nu_a$ as
small as possible. The least value compatible with the convergence of the
prior distributions in (17) is $\nu_a = p$, and we suggest taking this value.
There remains the choice of $\Lambda_a$.



In the form we have written (17), $\Lambda_a^{-1}$ is the expectation of $\Sigma_a^{-1}$ so that
$\Lambda_a$ can be thought of as one's prior opinion of the dispersion of the $\alpha$'s, and
similarly, $\Lambda_b$ for the $\beta$'s and $\Lambda_c$ for the $\gamma$'s. One suggestion is to rescale
the regression variables $x_s$ so that in the new scale, the prior
opinions about scatter of all the $\alpha_{is}$ are the same; then $\Lambda_a$ will have a
constant diagonal. However, once this scaling has been done, we are not
free to make a new scaling so that $\Lambda_b$ (and $\Lambda_c$) have constant diagonals. Hence,
in general, this device is not available. Nevertheless, we conjecture that in
many cases a common scaling might be reasonable. It would, for example, be
a fairly sophisticated form of prior knowledge that thought that $\alpha_{i7}$ (the last
suffix referring to $x_7$) was more dispersed than $\alpha_{i5}$, whereas $\beta_{j7}$ was less
dispersed than $\beta_{j5}$: in other words, the rows would have more effect on the
regression coefficients for the seventh variable than on the fifth, whereas
the situation would be reversed for the columns. The simplest assumption
is, therefore, to suppose $\Lambda_a$, $\Lambda_b$, and $\Lambda_c$, in a suitable scaling of the x's,
all have constant diagonal entries, though possibly varying with the matrices
(the interaction, for example, might, a priori, be judged smaller than the main
effects).

Having settled on the diagonal elements in $\Lambda_a$, it remains to consider
the off-diagonal values, or effectively, the correlations. There would seem
to be no real objection to putting these zero, except in one important case.
As explained above, the first predictor variable will typically be a constant
to allow for the regression surface not passing through the origin. In this
case $\theta_{ij1}$ has rather a different status from the other $\theta$'s, and in particular,
it is changed substantially by altering the origin of any predictor variable.
(In the last paragraph, changes of scale were under discussion.) Jackson
has suggested changing the origin of each genuine (that is, apart from the
constant $x_1$) predictor variable so that, a priori, the correlation between
$\alpha_{i1}$ and $\alpha_{is}$ ($s \neq 1$) is zero. But again, there is no necessity for this
origin being the same for the $\alpha$'s as for the $\beta$'s or $\gamma$'s. If we suppose,
on the lines of the argument in the last paragraph, that they are, then each
of $\Lambda_a$, $\Lambda_b$, and $\Lambda_c$ may be taken to be multiples of the unit matrix.

This completes the theory of the two-way regression analysis. Essentially
we have two groups of equations, (13) and (18), to solve for $(\theta_{ijs})$, $\sigma^2$,
$\Sigma_a$, $\Sigma_b$, and $\Sigma_c$. [The $\alpha$'s and $\beta$'s that occur in (18c) and (18d) can be
eliminated using (16).] We have to feed in $\nu_a$, $\nu_b$, and $\nu_c$ (we suggest
putting them all equal to p) and $\Lambda_a$, $\Lambda_b$, and $\Lambda_c$ (we suggest, after suitable
changes of origin and scale for the genuine regression variables, putting
these all equal to a multiple of the unit matrix, say $\Lambda_u = \delta_u I$). The data enter
through the sufficient statistics $X_{ij}^T X_{ij}$, $X_{ij}^T y_{ij}$, and the sum of squares in
(18a). Equations (13) and (18) are fairly involved and decidedly non-linear.
They, therefore, raise several problems in numerical analysis when we go to
solve them. We only offer a few comments on a possible mode of solution.

Notice that any computational procedure ought to output the inverse
of the matrix on the left-hand side of (13) since this provides the dispersion
matrix of the posterior distribution of $(\theta_{ij})$, exactly if the dispersions are
known, approximately otherwise. We do not provide a similar dispersion matrix
for the $\Sigma$'s since this posterior is not well understood. Indeed, we suspect
that the estimates of the $\Sigma$'s may not be too good; nevertheless, we believe
that any errors here will not seriously affect the estimation of the quantities
of primary interest, the $\theta$'s.

One possible way to solve these equations is to start with the least-squares
estimates for $\theta_{ij}$, to identify $\alpha_i$ with $\theta_{i\cdot}$ and $\beta_j$ with $\theta_{\cdot j}$ [cf. (16);
this amounts to putting $\Sigma_c = 0$], and then to estimate $\sigma^2$, $\Sigma_a$, $\Sigma_b$, and $\Sigma_c$ from
(18). With these values, solve (13) for $\theta_{ij}$ and obtain the $\alpha$'s and $\beta$'s from
(16). These new values can be inserted in (18) again and the cycle repeated.
Another possibility is, instead of starting with the least-squares values, to
guess a value for $\sigma^2$ and solve (13) with this value and $\Sigma_u = \Lambda_u$ (u = a, b, c).
From these values, the $\alpha$'s and $\beta$'s could be found with the same original
guesses, and then the iterative procedure suggested above repeated.
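The first of these schemes can be sketched in a few lines. The code below is only an illustration under several assumptions that are not in the paper: every cell has the same number r of replicates (so the data fit in rectangular arrays), the equations for the coefficients are written in terms of deviations from row, column, and overall means, the modal equations for the dispersions take the Wishart-based forms described above, and a fixed number of cycles is run with no convergence test. The function name `fit_two_way` and its arguments are hypothetical.

```python
import numpy as np

def fit_two_way(X, y, nu, Lam, n_iter=25):
    """Cycle through (18), (13), and (16), starting from least squares.

    X : (m, n, r, p) predictors; y : (m, n, r) responses
    nu, Lam : dicts keyed 'a', 'b', 'c' (Wishart degrees of freedom and matrices)
    """
    m, n, r, p = X.shape
    inv = np.linalg.inv
    XtX = np.einsum('ijks,ijkt->ijst', X, X)
    Xty = np.einsum('ijks,ijk->ijs', X, y)
    R = m * n * r
    Cm, Cn = np.eye(m) - 1.0 / m, np.eye(n) - 1.0 / n            # centring matrices
    Am, An = np.full((m, m), 1.0 / m), np.full((n, n), 1.0 / n)  # averaging matrices
    theta = np.linalg.solve(XtX, Xty[..., None])[..., 0]         # least-squares start
    t_all = theta.mean(axis=(0, 1))
    a_dev = theta.mean(axis=1) - t_all    # row effects, limit of (16) as Sigma_c -> 0
    b_dev = theta.mean(axis=0) - t_all    # column effects, same limit
    for _ in range(n_iter):
        # (18): modal variance and dispersions given the current estimates.
        resid = y - np.einsum('ijks,ijs->ijk', X, theta)
        sigma2 = np.sum(resid ** 2) / (R + 2)
        g = (theta - t_all - a_dev[:, None, :] - b_dev[None, :, :]).reshape(-1, p)
        Sc = (nu['c'] * Lam['c'] + g.T @ g) / (m * n + nu['c'] - p - 1)
        Sa = (nu['a'] * Lam['a'] + a_dev.T @ a_dev) / (m + nu['a'] - p - 2)
        Sb = (nu['b'] * Lam['b'] + b_dev.T @ b_dev) / (n + nu['b'] - p - 2)
        # (13): linear system for theta, cell (i, j) stored as block i*n + j.
        D = (np.kron(np.kron(Cm, Cn), inv(Sc))
             + np.kron(np.kron(Cm, An), inv(Sc + n * Sa))
             + np.kron(np.kron(Am, Cn), inv(Sc + m * Sb)))
        for c in range(m * n):
            i, j = divmod(c, n)
            D[c * p:(c + 1) * p, c * p:(c + 1) * p] += XtX[i, j] / sigma2
        theta = np.linalg.solve(D, (Xty / sigma2).ravel()).reshape(m, n, p)
        # (16): row and column effects as deviations from their means.
        t_all = theta.mean(axis=(0, 1))
        Sc_inv = inv(Sc)
        a_dev = (theta.mean(axis=1) - t_all) @ np.linalg.solve(n * Sc_inv + inv(Sa), n * Sc_inv).T
        b_dev = (theta.mean(axis=0) - t_all) @ np.linalg.solve(m * Sc_inv + inv(Sb), m * Sc_inv).T
    return theta, sigma2, Sa, Sb, Sc
```

The inverse of the matrix `D` from the final cycle is the approximate posterior dispersion matrix of the coefficients mentioned above, so a fuller implementation would return it as well.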